854 research outputs found

    Extending the mutual information measure to rank inferred literature relationships

    BACKGROUND: Within the peer-reviewed literature, associations between two things are not always recognized until commonalities between them become apparent. These commonalities can provide justification for the inference of a new relationship where none was previously known, and are the basis of most observation-based hypothesis formation. It has been shown that the crux of the problem is not finding inferable associations, which are extraordinarily abundant given the scale-free networks that arise from literature-based associations, but determining which ones are informative. The Mutual Information Measure (MIM) is a well-established method to measure how informative an association is, but is limited to direct (i.e. observable) associations. RESULTS: Herein, we attempt to extend the calculation of mutual information to indirect (i.e. inferable) associations by using the MIM of shared associations. Objects of general research interest (e.g. genes, diseases, phenotypes, drugs, ontology categories) found within MEDLINE are used to create a network of associations for evaluation. CONCLUSIONS: Mutual information calculations can be effectively extended into implied relationships and a significance cutoff estimated from analysis of random word networks. Of the models tested, the shared minimum MIM (MMIM) model is found to correlate best with the observed strength and frequency of known associations. Using three test cases, the MMIM method tends to rank more specific relationships higher than counting the number of shared relationships within a network
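
The abstract above does not give the MMIM formula, so the following is only a minimal sketch of one plausible reading. It assumes direct associations are scored by pointwise mutual information over document co-occurrence, and that an indirect pair is scored by summing, over each shared neighbour, the smaller of the two direct scores. The function names, the toy documents, and the summation step are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def direct_mi(docs):
    """Pointwise mutual information (bits) for every co-occurring term pair.

    `docs` is a list of term lists; keys of the result are sorted term pairs.
    """
    n_docs = len(docs)
    term_counts = Counter()
    pair_counts = Counter()
    for doc in docs:
        terms = sorted(set(doc))  # count each term once per document
        term_counts.update(terms)
        pair_counts.update(combinations(terms, 2))
    return {
        (a, b): math.log2(n_ab * n_docs / (term_counts[a] * term_counts[b]))
        for (a, b), n_ab in pair_counts.items()
    }


def lookup(mi, a, b):
    """Direct MI for an unordered pair, or None if the pair never co-occurs."""
    return mi.get((a, b) if a < b else (b, a))


def shared_min_score(mi, a, c):
    """ASSUMED indirect score: sum over shared neighbours b of
    min(MI(a, b), MI(b, c)). Not taken from the paper."""
    terms = {t for pair in mi for t in pair}
    total = 0.0
    for b in terms - {a, c}:
        ab, bc = lookup(mi, a, b), lookup(mi, b, c)
        if ab is not None and bc is not None:
            total += min(ab, bc)
    return total


# Hypothetical toy corpus: each inner list is the set of objects in one record.
docs = [
    ["geneA", "inflammation"],
    ["geneA", "inflammation"],
    ["inflammation", "diseaseX"],
    ["diseaseX", "obesity"],
]
mi = direct_mi(docs)
indirect = shared_min_score(mi, "geneA", "diseaseX")
```

Here `geneA` and `diseaseX` never co-occur, so their only link is the shared neighbour `inflammation`, and the weaker of the two direct scores sets the indirect score.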

    Truth, Probability, and Frameworks

    Yes http://www.plosmedicine.org/static/editorial#pee

    Data-Mining Analysis Suggests an Epigenetic Pathogenesis for Type 2 Diabetes

    The etiological origin of type 2 diabetes mellitus (T2DM) has long been controversial. The body of literature related to T2DM is vast and varied in focus, making a broad epidemiological perspective difficult, if not impossible. A data-mining approach was used to analyze all electronically available scientific literature, over 12 million Medline records, for “objects” such as genes, diseases, phenotypes, and chemical compounds that were linked to other objects within the T2DM literature but were not themselves within it. The goal of this analysis was to conduct a comprehensive survey to identify novel factors implicated in the pathology of T2DM by statistically evaluating mutually shared associations. Surprisingly, epigenetic factors received some of the highest statistical scores in this analysis, strongly implicating epigenetic changes within the body as causal factors in the pathogenesis of T2DM. Further analysis implicates adipocytes as the potential tissue of origin, and cytokines or cytokine-like genes as the dysregulated factor(s) responsible for the T2DM phenotype. The analysis provides a wealth of literature supporting this hypothesis, which, if true, represents an important paradigm shift for researchers studying the pathogenesis of T2DM.

    Predicting gene ontology from a global meta-analysis of 1-color microarray experiments

    BACKGROUND: Global meta-analysis (GMA) of microarray data to identify genes with highly similar co-expression profiles is emerging as an accurate method to predict gene function and phenotype, even in the absence of published data on the gene(s) being analyzed. With a third of human genes still uncharacterized, this approach is a promising way to direct experiments and rapidly understand the biological roles of genes. To predict function for genes of interest, GMA relies on a guilt-by-association approach to identify sets of genes with known functions that are consistently co-expressed with them across different experimental conditions, suggesting coordinated regulation for a specific biological purpose. Our goal here is to define how sample, dataset size and ranking parameters affect prediction performance. RESULTS: 13,000 human 1-color microarrays were downloaded from GEO for GMA analysis. Prediction performance was benchmarked by calculating the distance within the Gene Ontology (GO) tree between predicted function and annotated function for sets of 100 randomly selected genes. We find the number of new predicted functions rises as more datasets are added, but begins to saturate at a sample size of approximately 2,000 experiments. For the gene set used to predict function, we find precision to be higher with smaller set sizes, yet with correspondingly poor recall and, as set size is increased, recall and F-measure also tend to increase but at the cost of precision. CONCLUSIONS: Of the 20,813 genes expressed in 50 or more experiments, at least one predicted GO category was found for 72.5% of them. Of the 5,720 genes without GO annotation, 4,189 had at least one predicted ontology using top 40 co-expressed genes for prediction analysis. For the remaining 1,531 genes without GO predictions or annotations, ~17% (257 genes) had sufficient co-expression data yet no statistically significantly overrepresented ontologies, suggesting their regulation may be more complex.
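
The precision, recall and F-measure trade-off described in the abstract above can be illustrated with a small sketch. It assumes exact matching between predicted and annotated GO terms, which is simpler than the GO-tree distance benchmark the abstract describes; the function name and the example term sets are hypothetical.

```python
def precision_recall_f1(predicted, annotated):
    """Precision, recall and F1 for one gene's predicted vs. annotated terms.

    Assumes exact term matching (an illustrative simplification).
    """
    predicted, annotated = set(predicted), set(annotated)
    true_positives = len(predicted & annotated)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(annotated) if annotated else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative term sets only; not data from the study.
predicted_terms = {"GO:0006955", "GO:0008283", "GO:0006915"}
annotated_terms = {"GO:0006955", "GO:0006915", "GO:0007165", "GO:0016032"}
precision, recall, f1 = precision_recall_f1(predicted_terms, annotated_terms)
```

With a short prediction list most predictions can be correct while many annotations are missed, which is the high-precision, low-recall regime the abstract reports for small set sizes.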

    Systematic classification of non-coding RNAs by epigenomic similarity

    BACKGROUND: Even though only 1.5% of the human genome is translated into proteins, recent reports indicate that most of it is transcribed into non-coding RNAs (ncRNAs), which are becoming the subject of increased scientific interest. We hypothesized that examining how different classes of ncRNAs co-localized with annotated epigenomic elements could help understand the functions, regulatory mechanisms, and relationships among ncRNA families. RESULTS: We examined 15 different ncRNA classes for statistically significant genomic co-localizations with cell type-specific chromatin segmentation states, transcription factor binding sites (TFBSs), and histone modification marks using GenomeRunner (http://www.genomerunner.org). P-values were obtained using a Chi-square test and corrected for multiple testing using the Benjamini-Hochberg procedure. We clustered and visualized the ncRNA classes by the strength of their statistical enrichments and depletions. We found piwi-interacting RNAs (piRNAs) to be depleted in regions containing activating histone modification marks, such as H3K4 mono-, di- and trimethylation, H3K27 acetylation, as well as certain TFBSs. piRNAs were further depleted in active promoters, weak transcription, and transcription elongation regions, and enriched in repressed and heterochromatic regions. Conversely, transfer RNAs (tRNAs) were depleted in heterochromatin regions and strongly enriched in regions containing activating H3K4 di- and trimethylation marks, H2az histone variant, and a variety of TFBSs. Interestingly, regions containing CTCF insulator protein binding sites were associated with tRNAs. tRNAs were also enriched in the active, weak and poised promoters and, surprisingly, in regions with repetitive/copy number variations. CONCLUSIONS: Searching for statistically significant associations between ncRNA classes and epigenomic elements permits detection of potential functional and/or regulatory relationships among ncRNA classes, and suggests cell type-specific biological roles of ncRNAs.
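
The Benjamini-Hochberg correction named in the abstract above can be sketched as follows. This is a generic textbook implementation, not GenomeRunner's code; the function name and the example p-values are assumptions for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling each by m / rank and
    # carrying a running minimum so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four enrichment tests.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
```

An association would then be called significant when its adjusted p-value falls below the chosen false discovery rate, for example 0.05.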

    eTBLAST: a web server to identify expert reviewers, appropriate journals and similar publications

    Authors, editors and reviewers alike use the biomedical literature to identify appropriate journals in which to publish, potential reviewers for papers or grants, and collaborators (or competitors) with similar interests. Traditionally, this process has either relied upon personal expertise and knowledge or upon a somewhat unsystematic and laborious process of manually searching through the literature for trends. To help with these tasks, we report three utilities that parse and summarize the results of an abstract similarity search to find appropriate journals for publication, authors with expertise in a given field, and documents similar to a submitted query. The utilities are based upon a program, eTBLAST, designed to identify similar documents within literature databases such as (but not limited to) MEDLINE. These services are freely accessible through the Internet at http://invention.swmed.edu/etblast/etblast.shtml, where users can upload a file or paste text such as an abstract into the browser interface

    Suppression of ILC2 differentiation from committed T cell precursors by E protein transcription factors

    Current models propose that group 2 innate lymphoid cells (ILC2s) are generated in the bone marrow. Here, we demonstrate that subsets of these cells can differentiate from multipotent progenitors and committed T cell precursors in the thymus, both in vivo and in vitro. These thymic ILC2s exit the thymus, circulate in the blood, and home to peripheral tissues. Ablation of E protein transcription factors greatly promotes the ILC fate while impairing B and T cell development. Consistently, a transcriptional network centered on the ZBTB16 transcription factor and IL-4 signaling pathway is highly up-regulated due to E protein deficiency. Our results show that ILC2 can still arise from what are normally considered to be committed T cell precursors, and that this alternative cell fate is restrained by high levels of E protein activity in these cells. Thymus-derived lung ILC2s of E protein-deficient mice show different transcriptomes, proliferative properties, and cytokine responses from wild-type counterparts, suggesting potentially distinct functions

    Ethnicity-specific epigenetic variation in naïve CD4+ T cells and the susceptibility to autoimmunity

    BACKGROUND: Genetic and epigenetic variability contributes to the susceptibility and pathogenesis of autoimmune diseases. T cells play an important role in several autoimmune conditions, including lupus, which is more common and more severe in people of African descent. To investigate inherent epigenetic differences in T cells between ethnicities, we characterized genome-wide DNA methylation patterns in naïve CD4+ T cells in healthy African-Americans and European-Americans, and then confirmed our findings in lupus patients. RESULTS: Impressive ethnicity-specific clustering of DNA methylation profiling in naïve CD4+ T cells was revealed. Hypomethylated loci in healthy African-Americans were significantly enriched in pro-apoptotic and pro-inflammatory genes. We also found hypomethylated genes in African-Americans to be disproportionately related to autoimmune diseases including lupus. We then confirmed that these genes, such as IL32, CD226, CDKN1A, and PTPRN2 were similarly hypomethylated in lupus patients of African-American compared to European-American descent. Using patch DNA methylation and luciferase reporter constructs, we showed that methylation of the IL32 promoter region reduces gene expression in vitro. Importantly, bisulfite DNA sequencing demonstrated that cis-acting genetic variants within and directly disrupting CpG sites account for some ethnicity-specific variability in DNA methylation. CONCLUSION: Ethnicity-specific inherited epigenetic susceptibility loci in CD4+ T cells provide clues to explain differences in the susceptibility to autoimmunity and possibly other T cell-related diseases between populations. http://deepblue.lib.umich.edu/bitstream/2027.42/116042/1/13072_2015_Article_37.pd